Improved Multimodal Deep Learning with Variation of Information
Authors
Abstract
Deep learning has been successfully applied to multimodal representation learning problems; a common strategy is to learn joint representations shared across multiple modalities on top of layers of modality-specific networks. Nonetheless, the question of how to learn a good association between data modalities remains open; in particular, a good generative model of multimodal data should be able to reason about a missing data modality given the rest of the modalities. In this paper, we propose a novel multimodal representation learning framework that explicitly targets this goal. Rather than learning with maximum likelihood, we train the model to minimize the variation of information. We provide theoretical insight into why the proposed learning objective is sufficient to estimate the data-generating joint distribution of multimodal data. We apply our method to restricted Boltzmann machines and introduce learning methods based on contrastive divergence and multi-prediction training. In addition, we extend our method to deep networks with a recurrent encoding structure to finetune the whole network. In experiments, we demonstrate state-of-the-art visual recognition performance on the MIR-Flickr database and the PASCAL VOC 2007 database, with and without text features.
Similar papers
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep learning framework that integrates useful clues from multiple modalities, including static spatial appearance information, motion patterns within a short time window, audio information as well as long-range temporal ...
Extracting Visual Knowledge from the Web with Multimodal Learning
We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task. To overcome the deficiency of pure visual techniques, we propose to make use of meta text surrounding images on the Web for enhanced detection accuracy. In this paper we present a multimodal learning algor...
Multimodal Learning with Deep Boltzmann Machines
A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal...
Multimodal Deep Learning for Cervical Dysplasia Diagnosis
To improve the diagnostic accuracy of cervical dysplasia, it is important to fuse multimodal information collected during a patient’s screening visit. However, current multimodal frameworks suffer from low sensitivity at high specificity levels, due to their limitations in learning correlations among highly heterogeneous modalities. In this paper, we design a deep learning framework for cervica...
Deep learning: from speech recognition to language and multimodal processing
Li Deng (2016). Deep learning: from speech recognition to language and multimodal processing. APSIPA Transactions on Signal and Information Processing, Volume 5, e1. DOI: 10.1017/atsip.2015.22. Published online: 19 January 2016. Link: http://journals.cambridge.org/abstract_S2048770315000220